Internet Info 1997 December

Internet Message Format | 1997-02-19 | 31KB

Received: (from daemon@localhost) by services.bunyip.com (8.6.10/8.6.9)
        id CAA09508 for urn-ietf-out; Tue, 13 Aug 1996 02:29:19 -0400
Received: from mocha.bunyip.com (mocha.Bunyip.Com [192.197.208.1])
        by services.bunyip.com (8.6.10/8.6.9) with SMTP id CAA09501
        for <urn-ietf@services.bunyip.com>; Tue, 13 Aug 1996 02:29:13 -0400
Received: from mintaka.lcs.mit.edu by mocha.bunyip.com with SMTP
        (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA08711
        (mail destined for urn-ietf@services.bunyip.com);
        Tue, 13 Aug 96 02:29:07 -0400
Received: from skadhwe.lcs.mit.edu by MINTAKA.LCS.MIT.EDU id aa23324;
        13 Aug 96 2:28 EDT
Received: by skadhwe.lcs.mit.edu; (5.65/1.1.8.2/15Aug95-0306PM) id AA03241;
        Tue, 13 Aug 1996 02:28:54 -0400
Date: Tue, 13 Aug 1996 02:28:54 -0400
Message-Id: <9608130628.AA03241@skadhwe.lcs.mit.edu>
From: Lewis Girod <girod@LCS.MIT.EDU>
To: rdaniel@acl.lanl.gov
Cc: urn-ietf@bunyip.com
In-Reply-To: <2.2.32.19960802212958.006cb04c@acl.lanl.gov>
        (message from Ron Daniel on Fri, 02 Aug 1996 15:29:58 -0600)
Subject: Re: [URN] nasty rewriting rules
Sender: owner-urn-ietf@services.bunyip.com
Precedence: bulk
Reply-To: Lewis Girod <girod@LCS.MIT.EDU>
Errors-To: owner-urn-ietf@bunyip.com

******** NOTE: ********
*
* This reply is written in reference to the document and code at
*
*    http://ana-www.lcs.mit.edu/people/girod/translator.c   (~40K)
*
* The file is mostly document, and explains in great detail with examples the
* proposal discussed in this reply.
*
***********************

On Fri, 02 Aug 1996 15:29:58 -0600, Ron Daniel <rdaniel@acl.lanl.gov> wrote:

So, what *do* we want to avoid about regexps?

[...]

You said that your primary objection to them was that adopting them now might prevent future resolution systems that are not also based on general rewrite rules. With respect, I think that reason is a red herring.

A1. What do we need to fix about Regexps?

I didn't spend a lot of time explaining this clearly in the original proposal document, so I will summarize what I thought should be fixed:

(1) We should enable specific namespace schemes to be enforceable. I think this is a better way of putting what I had said originally, i.e. preventing future name schemes from being contorted. While I agree that it would be a patently bad idea to _specify_ a contorted name scheme, with regexp rules at the basis of a resolution system the whole idea of a name scheme specification no longer applies. At any level of delegation some part of a URN is likely to be essentially opaque, and lower levels really can't be stopped from varying from the intended scheme. This can be seen as a bug or a feature; I consider it a bug. To me it is better to explicitly define a scheme and relegate things that do not fit to other schemes (see discussion below in sections A4.1 and A3). You mentioned that one difficulty might be ``constructed OIDs'', which I am not familiar with... I would suspect that these could be just as easily constructed into a different name scheme (perhaps by prepending URN:oid-xyz: or something) rather than changing the previously defined format of OIDs. Further, part of this problem is caused by the present lack of a resolution system; once one exists people will make namespaces that fit it. How are constructed OIDs currently used and resolved?

(2) We should simplify the construction of maintenance interfaces. Maintainers need to understand what is going on with the system and need to keep the rules working as new functionality is added to them. It would be nice if this can be done with a simple interface which could display the data in an easy to follow manner and which provides static and end-to-end checks to ensure that the system is working. Such an interface is sketched out below in section A2. The regexps are hard to follow and at each stage the whole string has to be processed to figure out what part is relevant.
This makes them harder to fit into an interface.

(3) We should try to keep the bulk of the data in a format that is easily convertible in the future. This might be a feature of the maintenance interface, since the maintenance interface could keep the data in a canonical form which is then compiled into rules (see section A2.3). It seems to me that this sort of processing is much more difficult to do with regexps than with a simpler rule language. But to the degree that name scheme hierarchy can be specified (depending on the details of each specific scheme) it may be possible to have a single canonical data format for an entire name scheme.

(4) Regexp interpreters are hard to implement if they are not already available on the target system, whereas the languages in my proposal have been implemented quite easily.

Currently, there are a variety of naming schemes in existence that I would like to be able to resolve as URNs. (ISBNs, OIDs, FPIs, etc). The structure of those names is already set, and the ability of future systems to crack that structure is not affected by what we allow in NAPTRs.

There are two questions here. First, this statement is only true given that the structure of the name scheme actually remains consistent with this set structure in the context of NAPTR (this gets back to point (1) above). If they were resolved with NAPTR using regexps little would prevent someone from changing the format of an ISBN in various ways; there is no technical impediment to this, and if the top level ISBN naming authority doesn't want it to happen their only recourse is legal.

Second, I am curious to know how NAPTR would resolve these three name schemes. I can see how to do ISBN (if there are -'s in it), and OIDs shouldn't be a problem, even constructed ones as long as the constructions are in the set of things regexp rules can handle. Similarly, both of these can be dealt with by my proposal, although the set of interpretable constructions is more restricted.
Since FPIs are the other space we want to handle, has anyone figured out how regexps would be used to resolve them? From what I can tell it seems tricky.

I certainly don't want us to do something now with NAPTRs that is going to come back to haunt us.

[...]

1) The distinction between the "program" that is used to canonicalize the order of any hierarchy in the URN, and the "rules" that are used to yank things off the front and make the next domain name, needs to be more explicit. Only the people at urn.net need to know a lot about the program. Site administrators only need to write the rules. Putting them both into the same RR may be confusing.

This is true. I grafted them into the same record only to show how it fit into the original NAPTR framework. In fact, in section A2.1 I suggest using three records.

2) What I thought was one of the ameliorating influences in the NAPTR proposal was that the regexps would be applied to the original URN in its entirety. I think this is a much simpler conceptual model than your two-part approach, especially when the second part is rules that operate on the output of other rules.

3) One of the nasty bits about iterative rewrite rules is that upstream changes can affect the correctness of my rules. I foresee a class of errors arising where delimiters are not handled uniformly in upstream rules, making life a pain in the butt for downstream administrators. (Can you get rid of the + and - rules for handling delimiters and just say that if a character is matched, it is eaten?)

4) While error-prone, regexps are nonetheless familiar to a large number of people. The syntax of your proposal (which I understand you can change) does not seem to be an improvement on regexps, and is familiar to no-one.

5) "rules" in your model have two outputs - the domain name to be looked up and the tail of the canonicalized string to be left for subsequent rules.
This is a bit warty, but seems unavoidable in rules that are meant to be applied to the output of other rules.

I think that all four of these points can be addressed at the level of management software. Because of the simplification of the rules involved, the management software is easier to specify, and with a few added features I think it can ameliorate the issues with invisible breakage downstream.

A2. A General Framework for Management Software

A2.1 Records

The management software described here is used to create and update a set of DNS records on a given server. There are four types of record involved here:

(1) SRV records, providing pointers to terminal servers

(2) NAPTR records, providing an order parameter and a ruleset describing how to generate the next domain name

(3) NIDSPEC records, which provide for a given name scheme a translation program for rendering URNs canonical and an initial ruleset that generates the next domain name

(4) NADOC records, which contain documentation corresponding to a given NIDSPEC or NAPTR, as described below.

The description of this management software is very sketchy right now but you should be able to see the intent and how it might work. I think there are some clever simplifications that can be made to the design (especially with respect to the storage and signing of ``agreements'') but I haven't thought it through enough to see them. I offer this mainly as a proof of concept, not as a final solution. To explain how this management software works, let us consider an example.

A2.2 Contracts Between Clients and Administrators

At a site below the top level, management is taking place over a collection of data that is referenced by domain name. Suppose our site is foo.com, and they are in the business of being a ``name authority'' (or at any rate providing one step of resolution for a name authority); then xxx.foo.com retrieves data (NAPTR, NADOC, SRV records, etc.) from foo.com's name server.
The purpose of the management software is to generate foo.com's name database based on their clients' needs. The source data contains two logical types of record:

(1) client-instance descriptors, which describe rules that have a fixed set of valid operations. For example, a rule might apply only to two specific clients who can be listed explicitly: the rule ``m"girod/"*p"edu.mit.lcs.skadhwe";'' would only apply to my URNs and would direct clients only to my machine. Other rules might apply to two or a hundred clients.

(2) client-class descriptors, which describe a rule that is applied to an unspecified set of clients. For example, for a rule that locates a domain name in the text of the URN and appends ``.urn.net'', it might not be convenient to list all possible clients.

The format of a client-instance descriptor is as follows:

    client-instance {
        client agreement list = list of client-agreement
        order = integer
        ruleset = string    // in user-friendly form (see below)
    }

    client-agreement {
        // specification of expected input
        other info about referrer (parent) = (implementation specific)
        domain name reference came from = string
        sample expected tail of canonical URN from above = string

        // specification of promised output
        other info about client = (implementation specific)
        domain name of client referring to = string
        tail expected by client at above domain name = string
    }

Client-class descriptors need not be implemented separately; they can be implemented as trivial instances of client-instance descriptors by simply listing any relevant examples.

These descriptors would be set up as a result of an agreement between the naming authority and the clients, and would be fixed. The source data is stored in NADOC records and can be retrieved as such by the clients. In the NADOC record the agreement (excepting the rule specification) can be signed by the agreeing parties so that the agreement used to check the rule can be verified.
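
As a rough illustration only (not part of the proposal), the descriptor format above could be modelled along the following lines. The class names, field names, the `RuleApplier` signature, the stand-in `stub_applier`, and the sample values `urn.net` and `girod/paper1` are all assumptions made for this sketch; a real system would plug in an actual rule interpreter and would also carry the implementation-specific and signature fields.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: (ruleset text, incoming tail) -> (domain, new tail),
# or None when the ruleset does not apply to that tail.
RuleApplier = Callable[[str, str], Optional[Tuple[str, str]]]


@dataclass
class ClientAgreement:
    # Specification of expected input.
    referrer_domain: str   # domain name the reference came from
    sample_tail: str       # sample expected tail of the canonical URN
    # Specification of promised output.
    client_domain: str     # domain name of the client referred to
    expected_tail: str     # tail expected by the client at that domain


@dataclass
class ClientInstance:
    order: int
    ruleset: str           # rule text, stored as written by the administrator
    agreements: List[ClientAgreement] = field(default_factory=list)

    def broken_agreements(self, apply_rules: RuleApplier) -> List[ClientAgreement]:
        """Return every agreement whose promised output the ruleset misses."""
        return [
            a for a in self.agreements
            if apply_rules(self.ruleset, a.sample_tail)
            != (a.client_domain, a.expected_tail)
        ]


def stub_applier(ruleset: str, tail: str) -> Optional[Tuple[str, str]]:
    """Stand-in for a real rule interpreter: hard-codes a single behaviour."""
    prefix = "girod/"
    if tail.startswith(prefix):
        return "skadhwe.lcs.mit.edu", tail[len(prefix):]
    return None


instance = ClientInstance(
    order=10,
    ruleset='m"girod/"*p"edu.mit.lcs.skadhwe";',
    agreements=[
        ClientAgreement("urn.net", "girod/paper1", "skadhwe.lcs.mit.edu", "paper1"),
    ],
)
print(instance.broken_agreements(stub_applier))
```

An empty list means every listed agreement is honoured for its sample input; any agreement that comes back would need the ruleset fixed, or the agreement renegotiated, before the database is touched.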
The site administrator can update the rules, etc, but before the database is actually updated, the software runs a static check against the input/output specifications to make sure they work. If so the database is modified and an end-to-end test is made. The explicit specification of agreements provides safety against the breakage problem and the data involved can be kept in a fairly canonical form (i.e. sets of client-instance records) which should be easier to migrate to new systems.

A2.3 Canonical Data Storage for Automated Generation of Contracts

For some name schemes, it will be possible to have an even more foolproof system, one which generates the rules directly from a lexical list of namespaces. This is easiest in name schemes with simple syntax. A list of namespaces (using the model of delegation specified in section 4 of our I-D, which is similar to that of the ``path'' scheme) such as

    urn:bar:foo/a/*      --> a.com
    urn:bar:foo/c/*      --> c.com
    urn:bar:foo/c/fg/*   --> fg.com
    urn:bar:foo/x/*      --> x.com
    urn:bar:foo/x/y/z/*  --> z.com
    urn:bar:foo/x/yz/*   --> yz.com

could be used to automatically generate all of the rules and agreements involved, perhaps assuming that the tail string coming in begins right after ``urn:bar:foo/'' has been eaten (because at the previous level there was a namespace definition ``urn:bar:foo/* --> foo.com''). The various clients involved sign the agreements, and when changes are made most of the agreements stay the same, and any that change will need to be signed again. As a rule the agreements tend to remain pretty constant; mostly rules are added over time, not changed.

A2.4 A User Friendly Rule Syntax

Another important feature of this software should be a small rule compiler that translates an easy-to-understand language into the terse format transmitted to the browsers. This should make it easier for people to learn and program in the ``language'' (if such a simple thing can be called a language!!)
while at the same time making syntax errors less likely. For example, we could use S-expressions (pardon my rough adherence to standard BNF form..):

    <RULESET>     = ( *<STMT> )
    <STMT>        = ( eat-until <DELSET> <DISPOSITION> )
                  | ( eat-including <DELSET> <DISPOSITION> )
                  | ( eat-x-chars <INTEGER> <DISPOSITION> )
                  | ( match-prefix <STRING> <DISPOSITION> )
                  | ( match-rest <STRING> <DISPOSITION> )
    <DELSET>      = <STRING>
    <DISPOSITION> = &replace-with <STRING> | &copy

So for example, given a canonical form email address

    mail:edu.mit.lcs@girod

the ruleset

    ((eat-including ":" &replace-with "")
     (eat-until "@" &copy)
     (eat-including "@" &replace-with ".mail-urn"))

would ``compile'' to ``x":"+p"";x"@"-v;x"@"+p".mail-urn";'', and would generate the pair ("mail-urn.lcs.mit.edu", "girod"). Note that the translation took care of removing the ``urn:'' if it was there. This or something like it should be simple enough to learn.

6) I am not so sure that your matching rules will continue to be simpler than the major ideas in regexps. Point 3 showed one case where optionality would be extremely helpful, so that upstream changes in matching on a delimiter would not kill us. The religious debate on "urn:" is another. FPIs are a namespace this group is supposed to try to accommodate. They use :: and // in broadly equivalent ways. So, alternation is another capability that probably needs to be in the rewriting language. If those go into your language, you are well on the way to regexps.

At this time I have chosen to attack the upstream breakage problem differently; the URN: issue should be handled at the level of the canonical translation and won't affect the rulesets. In section 5 of the proposal document I discuss the issues that I see surrounding ``URN:''; I think that we should make an effort to resolve this debate sometime soon. FPIs are as yet still an unknown here, and indeed they may very well be difficult to handle.
I was hoping to figure out how to do so after seeing a general plan for doing so with rewrite rules, mainly because it seems possible to me that the NAPTR system might be used to handle some sections of that namespace while relying on other mechanisms to handle the remainder.

Another concern that I have is the growth of the number of rewrite rules over time as portions of collections are delegated to other agencies. This is a concern about rewriting, and applies to either your scheme or to regexps.

This is true, and I see this as the central reason for making sure that NAPTR can be replaced by something that will still be efficient when partial delegations become more the rule than the exception. It is my opinion that the translation portion of this proposal will come in handy when a new system is implemented.

A3. How Canonical Translation Fits into the Future

In the future, I think canonical translation will be a very useful thing. The main reason is that if the direction of hierarchy is consistent, information about prefixes can be cached (i.e. classes of URNs that share the same resolution information). Without a canonical hierarchy such cache systems would need to understand the hierarchy format of any name scheme from which prefixes will be cached, and URNs for which the hierarchy is not understood could only be cached individually. To my way of thinking the logical place to do this transform is at the client (if that is in fact possible..) so that URNs can be handled consistently within the resolution/caching mechanism until the terminal resolver is reached (which should operate on the original URN). This is a URN framework issue.

Ignoring for the moment the question of whether name schemes should in fact be specified (see discussion in A4.1), a given translation language has a finite and _definite_ capacity to render URNs canonical. If we make a choice and specify a language it enables a system to process a subset of possible URNs in a consistent manner.
Other schemes for URNs would need to be handled by a different system, which could be implemented later with a more capable translation language encompassing a different set of URNs, and so on. In the short term, the initial translation language would be coded into the browser, and it might be wise for browser implementations to leave room to plug new functionality into the translation language. My guess is that a translation language capable of dealing with the existing namespaces will provide sufficient flexibility that new namespaces will have no trouble fitting into the system.

What I would like to see is some more analysis of what capabilities are needed (like the stuff above on why I think alternation and optionality are needed).

A4. What Capabilities are Needed and Why?

As we have seen, the basic theory in terms of what is needed is that by folding stuff into the translation layer the rules layer doesn't need much functionality at all.

A4.1 Specification of Hierarchy on a Per-Name-Scheme Basis

One potential problem with this proposal is its presupposition that the hierarchy for a name scheme can be specified, which may or may not be valid (i.e. constructed OIDs.) I think we need to establish whether this is a realistic problem. Given that there are namespaces that cannot be rendered canonical in a single preprocessing step, can we estimate the negative impact of escaping to another mechanism at the level at which the specification no longer holds (in the case of OIDs many of them would be in canonical form already)?
One possible ameliorating factor here is that because the translation layer represents a partial specification of name scheme syntax, there may be a subsequent reduction in the bother associated with allowing new name schemes to be developed (the framework document suggested that there should be standards for admission, etc). Does this then imply that names (such as constructed OIDs) that violate their ``default'' specification might instead be created in a (possibly new) name scheme for which they are structured according to spec?

I am fairly convinced that the benefit gained by the canonical hierarchy well outweighs the problems it causes. The central benefits, which I have touched on elsewhere in this message, are

(1) Simpler rules in the NAPTR scheme

(2) Makes it easier for management software to store data canonically

(3) Makes caching easier for future systems

(4) Allows name schemes to specify a hierarchy and enables technical solutions to enforce that specification to a much greater degree

The problems are:

(1) Some existing namespaces may be incompatible and will need to escape to separate resolution technologies

(2) Some future namespaces may be incompatible or may be compelled to maintain compatibility

The question is: how many URNs and namespaces are we talking about, and is it a problem for them to make the (possibly minor) modifications required to be compatible?

Assuming for the moment that the name schemes can be sufficiently specified, the next question is: what sorts of things will we need to handle after canonicalization and how powerful does the rule language have to be? Ron brought up optionality and alternation as two things we will need. In many cases this can be handled with the language currently specified.

A4.2 Support for Alternation

If we want to pull off a token delimited by either :: or // we can determine which of these cases we have using the match construct.
For example, if the following two rules are tried on "foo::bar" and "foo//bar":

    ((eat-until ":/" &copy)
     (match-prefix "::" &replace-with ".")
     (eat-until "" &copy)
     (eat-x-chars 0 &replace-with ".colons"))

    ((eat-until ":/" &copy)
     (match-prefix "//" &replace-with ".")
     (eat-until "" &copy)
     (eat-x-chars 0 &replace-with ".slashes"))

the results will be ("colons.bar.foo", "") and ("slashes.bar.foo", "") respectively. This is because the match-prefix construct fails if it doesn't match the specified string prefix exactly.

Another way of handling this would be to replace the syntactic structures "::" and "//" with special marker characters in the translation step (that way the initial string can contain a colon or slash).

In short, alternation is possible if the point of alternation can be recognised by either a set of possible characters or a pre-inserted marker.

A4.3 Support for Optionality

If we want to detect a port number in a URL if it is there (I suggest putting one in during the translation step however..), we can do something like this:

    ((eat-until ":/" &copy)
     (match-prefix ":" &replace-with ".tcp_port_")
     (eat-until "/" &copy)
     (eat-including "/" &replace-with ""))

    ((eat-until ":/" &copy)
     (match-prefix "/" &replace-with ".tcp_port_80"))

so,

    com.foo:8001/path  generates  ("tcp_port_8001.foo.com", "path")
    com.foo/path       generates  ("tcp_port_80.foo.com", "path")

In short, optionality can be handled if the option can be recognised by either a set of possible characters or a pre-inserted marker.

A4.4 Other Things?

What else do we need? I'm not sure. I have by no means proved that this language is sufficient and don't know how I would go about doing that. However I believe the primary reason for needing functionality here is grandfathering; barring radical new technologies (which would probably come with their own radical new resolution mechanisms), new namespaces will likely make do with what is there if it is reasonably flexible.
Consequently I am more concerned with specific counterexamples than I am with generalized ideas. That is, we look at the names we have to deal with and try to see how they would be handled by rulesets and by regexps.

I would also like to see some info on the general problems of rewrite rules and what, if anything, we can do to overcome them. Some of the problems with rules are:

1) Confusion on how a result was obtained, typically due to successive applications of rules or to overly-capable rewrite languages.

2) Accretion of rules over time making a total rats nest of delegations. This is related to 1.

I believe how a result is obtained can be determined by tracing the lookup process, similarly to the original NAPTR process. No doubt the trace may get confusing, and worse, long. As the number of servers in the path grows this can get to be a real problem. Also, the number of rulesets (or, equivalently, NAPTR regexps) returned from a given lookup may grow as delegations increase in number, meaning that more data is being transferred back to the client at a given step. Both of these issues concern me and affect both proposals in similar ways (however, as I indicated in section 3.0, the point of this proposal is more to make it easier to migrate forward later than it is to fix the long term problems with the resolution mechanism -- it is still really NAPTR at heart.)

3) Upstream changes invisibly breaking downstream results.

4) My site getting no requests / somebody else's requests due to upstream errors.

As you point out, the regexp rules don't have that property of invisible breakage. Is the mgmt software solution indicated in section 2.0 a good enough fix for the ruleset proposal? Also, note that the limitations on rulesets severely limit the ways in which rulesets returned from the same lookup can interfere. It also makes it easier for mgmt software to be written to verify them.
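
To give a feel for how little machinery such verification needs, here is a minimal sketch of an interpreter for the S-expression rule language of section A2.4, written as Python tuples rather than parsed text. It is not the code in translator.c. The semantics are assumptions inferred from the worked examples in A2.4, A4.2 and A4.3: eat-until stops before the first delimiter (or consumes everything if none is found), eat-including and match-prefix consume what they match and fail otherwise, and the domain name is formed by reversing the dot-separated labels of the accumulated output. The names `apply_ruleset`, `first_match`, `COPY` and the ruleset constants are invented for this sketch.

```python
from typing import List, Optional, Tuple, Union

COPY = None  # disposition &copy; any string stands for &replace-with <STRING>

# (operator, argument, disposition)
Stmt = Tuple[str, Union[str, int], Optional[str]]
Result = Optional[Tuple[str, str]]


def apply_ruleset(rules: List[Stmt], tail: str) -> Result:
    """Return (domain name, remaining tail), or None if any statement fails."""
    out = []
    for op, arg, disposition in rules:
        if op == "eat-until":
            # Stop before the first delimiter; assumed to eat everything
            # when no delimiter is present (this covers the empty set "").
            n = next((i for i, c in enumerate(tail) if c in arg), len(tail))
        elif op == "eat-including":
            # Eat through the first delimiter; assumed to fail without one.
            n = next((i + 1 for i, c in enumerate(tail) if c in arg), -1)
        elif op == "eat-x-chars":
            n = arg if 0 <= arg <= len(tail) else -1
        elif op == "match-prefix":
            n = len(arg) if tail.startswith(arg) else -1
        elif op == "match-rest":
            n = len(arg) if tail == arg else -1
        else:
            raise ValueError(f"unknown operator: {op}")
        if n < 0:
            return None
        eaten, tail = tail[:n], tail[n:]
        out.append(eaten if disposition is COPY else disposition)
    # Canonical form is most-significant-first; DNS wants the reverse.
    labels = [label for label in "".join(out).split(".") if label]
    return ".".join(reversed(labels)), tail


def first_match(rulesets: List[List[Stmt]], tail: str) -> Result:
    """Try each ruleset in order and return the first one that succeeds."""
    for rules in rulesets:
        result = apply_ruleset(rules, tail)
        if result is not None:
            return result
    return None


MAIL = [("eat-including", ":", ""), ("eat-until", "@", COPY),
        ("eat-including", "@", ".mail-urn")]
COLONS = [("eat-until", ":/", COPY), ("match-prefix", "::", "."),
          ("eat-until", "", COPY), ("eat-x-chars", 0, ".colons")]
SLASHES = [("eat-until", ":/", COPY), ("match-prefix", "//", "."),
           ("eat-until", "", COPY), ("eat-x-chars", 0, ".slashes")]
PORT = [("eat-until", ":/", COPY), ("match-prefix", ":", ".tcp_port_"),
        ("eat-until", "/", COPY), ("eat-including", "/", "")]
NO_PORT = [("eat-until", ":/", COPY), ("match-prefix", "/", ".tcp_port_80")]

print(apply_ruleset(MAIL, "mail:edu.mit.lcs@girod"))
print(first_match([COLONS, SLASHES], "foo//bar"))
print(first_match([PORT, NO_PORT], "com.foo/path"))
```

Under these assumed semantics the sketch reproduces the pairs quoted in sections A2.4, A4.2 and A4.3, and a static check of an agreement reduces to one call to `first_match` on the sample tail followed by a comparison with the promised output.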
The point in starting this thread was to talk about rewrite rules and if they were a flaw or a feature. At the BOF, concern was raised about rewrite rules and someone said that if meeting the requirements requires the power of rewrite rules, and if we don't like that answer, do we want to change the requirements? Up to now I have not seen any convincing arguments that the requirements can be met without using rewriting. Nor have I seen any calls for us to change the requirements on URNs, at least not in any fashion that restricts the task to be accomplished. (If we were to simplify the requirements, it would be appropriate to reconsider the abilities of NAPTRs).

I think there is a third alternative, which more accurately describes my proposal: _directly_ support all URN spaces with the exception of those for which the implementation requires a system of regexp rules. That is, balance the need to support URNs now and in the future with the relevant technical capabilities. For example, suppose sorting out some subset of the URNs in use required a Turing-complete language while the others could all be processed using a finite automaton; if the set of URNs gained by adding this functionality (and at the same time adding the potential to loop) is relatively insignificant, the best solution may be to escape and resolve them another way.

If we assume for the moment that rewriting is necessary, then we get into discussing what capabilities are needed and how they will be expressed in the language that is used for the rewrite rules. This is where I think your proposal fits. It represents a noticeable reduction in the capabilities of the NAPTR system, since major rewriting can only happen at the namespace level when we canonicalize the URN. But that was the purpose of the proposal. True. I kind of like the notion of rules that eat off the front of the canonicalized name.
However, I am not sure that only allowing that canonicalization to be performed at the namespace level is enough. Especially as collections of resources are delegated over time, I think we are going to see lots of URNs that are old URNs plus some new stuff. It seems impossible to canonicalize such things with the limited amount of information available at the namespace level. (See the stuff on "constructed OIDs" below for more info).

[...]

I am concerned about the "program" portion of the rewrite only being available at the namespace level. As a specific example, I came across "constructed OIDs" the day before yesterday. These seem to allow "other stuff" to be appended to an OID to make a new OID that does not follow the simple rules of integer name components. If we have an OID namespace, all it can possibly know about constructed OIDs is where the "other stuff" starts and ends. We can't provide a program at the namespace level that will canonicalize the order of all components.

This gets at what we want to mean by name scheme specification (see sections A3, A4.1). Certainly the original specification of OIDs doesn't allow these additions? Do you have more information about constructed OIDs?

>* It is already clear that there are some name schemes that cannot be
>efficiently resolved by this new scheme and will need to be escaped
>(just as some cannot be conveniently handled by NAPTR). To what
>degree is it necessary to add flexibility beyond what is required to
>grandfather in existing namespaces?

Well, this sort of depends on what namespaces you want to grandfather in. :-) The regexps give us a set of tools to deal with new things as they arise, typically in the course of trying to grandfather in a
new namespace.
^^^^^^^^^^^^^

By ``new namespace'' do you mean one that is recently invented or an existing namespace?
I suspect it would be more likely that new namespaces be designed to fit into whatever structure is in place, rather than go through a lot of trouble to build an alternate resolution mechanism. It's easy to invent a namespace that can't be resolved effectively by NAPTR with regexps -- but is that our problem?

Apologies for the length of this reply, but there were a lot of questions to answer.

Thank you for your comments on the proposal,

Regards,
Lewis